Road accidents in New York

This analysis offers insight into car crashes in New York from 2016 to 2022, focusing on key factors such as vehicle type, hour of day, and weather.

Files

We have two datasets: one containing the vehicle crashes and another describing the weather at the location.

| Name | Rows | Columns | Each row is a | Link |
|---|---|---|---|---|
| Vehicles dataset | 2.11M | 29 | Motor Vehicle Collision | Vehicles dataset |
| Weather dataset | 59,760 | 10 | Time stamp of Weather | Weather dataset |

Data Preparation

In this step we collect the data from both datasets, clean it, and merge it into a single comprehensive dataset for analysis. Before merging, the data is cleaned, enriched with additional information, and reduced in size to make it more manageable.

Data Cleaning

The data cleaning process involves removing columns with high NA ratios, filtering out rows with missing values, and creating new columns to categorize the main causes of accidents. The data is then enriched with additional information such as the day of the week, month, quarter, year, and time of day.
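The enrichment step can be sketched in base R as follows. This is only an illustration: the column name `crash_time`, the sample timestamps, and the time-of-day boundaries are assumptions, since the report does not list them.

```r
# Hypothetical crash timestamps; the real column names are not given in the report.
crashes <- data.frame(
  crash_time = as.POSIXct(
    c("2021-01-15 08:30:00", "2021-07-04 23:10:00", "2021-10-31 14:05:00"),
    tz = "UTC"
  )
)

hour  <- as.integer(format(crashes$crash_time, "%H"))
month <- as.integer(format(crashes$crash_time, "%m"))

# Derive the calendar fields mentioned in the text.
crashes$year    <- as.integer(format(crashes$crash_time, "%Y"))
crashes$month   <- month
crashes$quarter <- (month - 1L) %/% 3L + 1L
crashes$weekday <- as.integer(format(crashes$crash_time, "%u"))  # 1 = Monday, 7 = Sunday

# Assumed time-of-day bins: [0,6) Night, [6,12) Morning, [12,18) Afternoon, [18,24) Evening.
crashes$time_of_day <- cut(
  hour,
  breaks = c(0, 6, 12, 18, 24),
  labels = c("Night", "Morning", "Afternoon", "Evening"),
  right = FALSE
)
```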

Both datasets, especially vehicles.csv, contain many rows and require a lot of memory and time to process. For this reason, we decided to drop rows with missing values and columns with high NA ratios, such as vehicle type 3, 4 and 5, which are populated only for the few accidents that involve several vehicles.
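A minimal sketch of this kind of cleaning, using base R only. The toy data frame, its column names, and the 50% NA threshold are assumptions for illustration; the report does not state the actual cutoff.

```r
# Toy stand-in for vehicles.csv (hypothetical columns and values).
vehicles <- data.frame(
  collision_id   = 1:5,
  vehicle_type_1 = c("Sedan", "SUV", NA, "Taxi", "Sedan"),
  vehicle_type_3 = c(NA, NA, NA, "Bus", NA),
  stringsAsFactors = FALSE
)

na_threshold <- 0.5  # assumed cutoff, not taken from the report

# Share of missing values per column.
na_ratio <- colMeans(is.na(vehicles))

# Keep only columns at or below the threshold, then drop incomplete rows.
keep_cols      <- names(na_ratio)[na_ratio <= na_threshold]
vehicles_clean <- vehicles[, keep_cols, drop = FALSE]
vehicles_clean <- vehicles_clean[complete.cases(vehicles_clean), , drop = FALSE]
```

Dropping the sparse columns first matters: filtering rows before that would discard almost every row, because most crashes have no third vehicle.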

Weather.xlsx is a smaller dataset, so its cleaning process is simpler: we only need to convert the time column to a proper date-time format and rename some columns for clarity. Finally, the cleaned vehicle data is merged with the weather data to create a comprehensive dataset for analysis.
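The weather cleaning could look roughly like this. The raw column names (`time`, `precip`), the timestamp format, and the new name `rain_mm` are all assumptions, as the report does not show the original headers.

```r
# Toy stand-in for the weather sheet (hypothetical columns and format).
weather <- data.frame(
  time   = c("2021-03-01T08:00", "2021-03-01T09:00"),
  precip = c(0, 2.5),
  stringsAsFactors = FALSE
)

# Parse the text timestamps into a proper date-time class.
weather$time <- as.POSIXct(weather$time, format = "%Y-%m-%dT%H:%M", tz = "UTC")

# Rename a column to something more descriptive.
names(weather)[names(weather) == "precip"] <- "rain_mm"
```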

Merging Dataframes

At this point we have two clean datasets, one with the vehicle crashes and the other with the weather data. Cleaning has reduced the vehicle crashes dataset to about half of its original rows. We will merge both into a single dataset to perform the analysis.
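One plausible way to perform the merge is to join each crash to the weather record of the same hour. The report does not say which key was used, so the hourly key, the column names, and the sample rows below are assumptions.

```r
# Toy inputs (hypothetical columns and values).
crashes <- data.frame(
  collision_id = 1:3,
  crash_time = as.POSIXct(
    c("2021-03-01 08:12:00", "2021-03-01 08:47:00", "2021-03-01 17:30:00"),
    tz = "UTC"
  )
)
weather <- data.frame(
  time    = as.POSIXct(c("2021-03-01 08:00:00", "2021-03-01 17:00:00"), tz = "UTC"),
  rain_mm = c(0, 2.5)
)

# Build a shared text key that identifies the hour of each record.
crashes$hour_key <- format(crashes$crash_time, "%Y-%m-%d %H:00")
weather$hour_key <- format(weather$time, "%Y-%m-%d %H:00")

# Inner join: every crash picks up the weather columns of its hour.
merged <- merge(crashes, weather, by = "hour_key")
```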

The result is:

| Name | Rows | Columns | Each row is a |
|---|---|---|---|
| Merged dataset | 1M | 40 | Combination of vehicle and weather data |

After merging the data, we save each year to a separate file to make it easier to analyze the data by year. We will run the analyses both on the full dataset and on individual years.
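Splitting the merged data by year and writing one file per year can be sketched as below. The `year` column, the output folder, and the `merged_<year>.csv` naming are hypothetical choices made for this example.

```r
# Toy merged dataset (hypothetical columns and values).
merged <- data.frame(
  year         = c(2020L, 2020L, 2021L),
  collision_id = 1:3
)

out_dir <- tempdir()  # replace with the project's data folder

# One data frame per year, then one CSV per data frame.
by_year <- split(merged, merged$year)
paths <- vapply(names(by_year), function(y) {
  path <- file.path(out_dir, paste0("merged_", y, ".csv"))
  write.csv(by_year[[y]], path, row.names = FALSE)
  path
}, character(1))
```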

Analysis

Total Accidents in New York

In this graph we can see the total number of accidents per year in New York from 2016 to 2022. The number of accidents seems to be decreasing over the years, which is a positive trend.

It is important to note that we cannot draw conclusions from this graph alone, as other factors may influence the number of accidents. For example, the COVID-19 pandemic and the lockdowns of 2020 and 2021 could have reduced the number of vehicles on the road and therefore the number of accidents.

Correlation between Total Accidents and Total Rainfall per Month

## [1] "Correlation coefficient: 0.59"

This graph shows the correlation between the total number of accidents and the total rainfall per month in New York in 2021. The result of 0.59 indicates a moderate positive correlation between the two variables, suggesting that higher rainfall may lead to more accidents.
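A coefficient like the one above can be obtained with `cor()` on monthly totals. The numbers below are made-up illustrative values, not the report's data, and the column names are assumptions.

```r
# Illustrative monthly totals only (not the real 2021 figures).
monthly <- data.frame(
  month       = 1:6,
  accidents   = c(410, 380, 450, 470, 520, 490),
  rainfall_mm = c(60, 45, 80, 70, 110, 95)
)

# Pearson correlation between monthly accidents and monthly rainfall.
r <- cor(monthly$accidents, monthly$rainfall_mm, method = "pearson")

print(paste("Correlation coefficient:", round(r, 2)))
```

With only twelve monthly points per year, `cor.test()` may be worth running as well, since it reports a confidence interval alongside the coefficient.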

Monthly Accidents and Rainfall

Monthly Number of Accidents by Rainfall Category with Monthly Rainfall

This graph shows the monthly number of accidents in New York in 2021, categorized by rainfall intensity. The black line represents the total number of accidents per month, while the blue bars represent the total rainfall per month on a secondary y-axis. The dots represent the number of accidents in each rainfall category.

The number of accidents tends to increase with higher rainfall, especially in the “1 mm - 4 mm (Light rain)” and “>4 mm - 7 mm (Moderate rain)” categories.

At the same time, the majority of accidents occur in no-rain conditions, which could simply reflect the fact that most of the time there is no rain in New York. Hence, without the total number of vehicles on the road, it is difficult to draw conclusions from this data alone.
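The rainfall categories used in the graph can be assigned with a small helper like the one below. The two middle labels come from the text above; the "No rain" and ">7 mm (Heavy rain)" labels, the below-1 mm cutoff, and the function name `rain_category` are assumptions.

```r
# Map rainfall in mm to a category label.
# Assumed boundaries: below 1 mm counts as no rain, above 7 mm as heavy rain.
rain_category <- function(mm) {
  ifelse(mm < 1, "No rain",
  ifelse(mm <= 4, "1 mm - 4 mm (Light rain)",
  ifelse(mm <= 7, ">4 mm - 7 mm (Moderate rain)",
         ">7 mm (Heavy rain)")))
}

# Hypothetical rainfall recorded at the time of each accident.
accident_rain_mm <- c(0, 0, 0.2, 1, 3.5, 4, 4.5, 7, 9)

# Number of accidents per rainfall category.
counts <- table(rain_category(accident_rain_mm))
```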

Chapter of Choice (Esquisse (Web Server App))

If you like Tableau, you will love Esquisse

Esquisse is an R package for building plots interactively. It is similar to Tableau in that it provides a user-friendly drag-and-drop interface for creating visualizations without writing code.

To use Esquisse, you need to install the package and then load it in your R script. After loading the package, you can launch the web app by calling the esquisser() function with your data as an argument. That will open a web browser with the Esquisse interface, where you can create plots interactively by dragging and dropping variables. Esquisse works with the plotly package to create interactive plots.
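The steps above can be sketched as follows. The wrapper `launch_esquisse()` is a hypothetical helper added here so the script does nothing harmful outside an interactive session, and `mtcars` stands in for the merged crash data.

```r
# install.packages("esquisse")  # one-time install from CRAN

# Hypothetical helper: only launch the app when a user is present
# and the package is installed.
launch_esquisse <- function(data) {
  if (interactive() && requireNamespace("esquisse", quietly = TRUE)) {
    # viewer = "browser" opens the interface in the web browser;
    # "dialog" and "pane" keep it inside RStudio.
    esquisse::esquisser(data = data, viewer = "browser")
  } else {
    message("esquisse needs an interactive R session with the package installed.")
    invisible(NULL)
  }
}

launch_esquisse(mtcars)
```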

Esquisse

At the bottom left of the pane, in the options tab, you can choose to render the plots with plotly to make them interactive. Once this is active, you can hover over the plots to see the data points and values, or click on the legend to filter the data.

Esquisse Options